Abstract

This paper presents a strongly geometrical approach to the Fisher distance, which is a
measure of dissimilarity between two probability distribution functions. The Fisher
distance, like other divergence measures, is also used in many applications to establish a
proper data average. The paper focuses on statistical models of the normal probability
distribution functions and takes advantage of the connection with classical hyperbolic
geometry to derive closed forms for the Fisher distance in several cases. Connections with
the well-known Kullback-Leibler divergence measure are also established. The main purpose
is to widen the range of possible interpretations and relations of the Fisher distance and
its associated geometry for prospective applications, in particular to information theory.

1 Introduction



Information geometry is a research field that has provided a framework and enlarged the
perspective of analysis for a wide variety of domains, such as statistical inference,
information theory, mathematical programming and neurocomputing, to name a few. It is
an outcome of the investigation of the differential geometric structure on manifolds of
probability distributions, with the Riemannian metric defined by the Fisher information
matrix. Rao's pioneering work [16] was subsequently followed by several authors (e.g.
[2], [17], [12], among others). We quote [1] as a general reference for this matter.

Concerning information theory and signal processing specifically, an important
aspect of the Fisher matrix arises from its trace being related to the surface area of the
typical set associated with a given probability distribution, whereas the volume of this
set is related to the entropy. This was used to establish connections between inequalities
in information theory and geometric inequalities ([5], [8]).

In general, many applications demand a measure of dissimilarity between the distri- 
butions of the involved objects, or also require the replacement of a set of data by a 
proper average or a centroid. In both cases, the Fisher distance may apply as well as 
other dissimilarity measures ([HI [TJ1 IT5] ) 

Our contribution in this paper is to present a geometrical view of the Fisher matrix,
focusing on the parameters that describe the univariate and the multivariate normal
distributions, with the aim of widening the range of possible interpretations for the
prospective applications of information geometry to information theory and other related
fields.

Our geometrical reading allowed us to employ results from classical hyperbolic
geometry and to derive closed expressions for the Fisher distance in special cases of the
multivariate normal distributions. A preliminary version of some results presented here
has appeared in [6].

This text is organized as follows: in Section 2 we explore the two-dimensional statistical
model of the Gaussian (normal) univariate probability distribution function (PDF). Closed
forms for the Fisher distance are derived in the most common parameters and a relationship
with the Kullback-Leibler measure of divergence is presented. Section 3 is devoted to
the Fisher information geometry of the multivariate normal PDF's. For the special cases
of the round Gaussian distributions and normal distributions with diagonal covariance
matrices, closed forms for the distances are derived. We also discuss the Fisher information
matrix for the general bivariate case.

2 The hyperbolic model of the mean × standard deviation half-plane

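The geometric model of the mean × standard deviation half-plane associates each point
in the upper half-plane of $\mathbb{R}^2$ with a univariate Gaussian probability distribution function

$$f(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$

Hence, a classic parametric space for this family of PDF's is

$$\mathbb{H} = \{(\mu, \sigma) \in \mathbb{R}^2 \mid \sigma > 0\}.$$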




A distance between two points $P = (\mu_1, \sigma_1)$ and $Q = (\mu_2, \sigma_2)$ in the half-plane $\mathbb{H}$
should reflect the dissimilarity between the associated PDF's. We will not distinguish the
notation of the point $P$ in the parameter space from that of its associated PDF $f(x, P)$.

A comparison between univariate normal distributions is illustrated in Figure 1. By
fixing the means and increasing the standard deviation, we can see that the dissimilarity 
between the probabilities attached to the same interval concerning the PDF's associated 
with C and D is smaller than the one between the PDF's associated with A and B (left). 
This means that the distance between points in the upper half-plane (right) representing 
normal distributions cannot be Euclidean. Moreover, we can observe that such a metric 
must vary with the inverse of the standard deviation $\sigma$. The points C and D should be
closer to each other than the points A and B, reflecting that the pair of distributions A 
and B is more dissimilar than the pair C and D. 

A proper distance arises from the Fisher information matrix, which is a measure of the
amount of information of the location parameter ([7], ch. 12). For univariate distributions
parametrized by an n-dimensional space, the coefficients of this matrix, which define a
metric, are calculated as the expectation of a product involving partial derivatives of the
logarithm of the PDF's:

$$g_{ij}(\beta) = \int_{-\infty}^{\infty} f(x, \beta)\, \frac{\partial \ln f(x, \beta)}{\partial \beta_i}\, \frac{\partial \ln f(x, \beta)}{\partial \beta_j}\, dx.$$

A metric matrix $G = (g_{ij})$ defines an inner product as follows:

$$\langle u, v \rangle_G = u^T (g_{ij})\, v \quad \text{and} \quad \|u\|_G = \sqrt{\langle u, u \rangle_G}.$$

The distance between two points $P$, $Q$ is given by the number which is the minimum
of the lengths of all the piecewise smooth paths $\gamma$ joining these two points. The length
of a path $\gamma(t)$ is calculated by using the inner product $\langle \cdot, \cdot \rangle_G$:



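$$\text{Length of } \gamma = \int_{\gamma} ds = \int_a^b \|\gamma'(t)\|_G\, dt,$$

where $t \in [a, b]$ parametrizes $\gamma$, and so

$$d_G(P, Q) = \min_{\gamma} \{\text{Length of } \gamma\}.$$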



A curve that encompasses this shortest path is a geodesic.

In the univariate normally distributed case described above we have $\beta = (\mu, \sigma)$
and it can be easily deduced that the Fisher information matrix is

$$[g_{ij}(\mu, \sigma)]_F = \begin{pmatrix} \dfrac{1}{\sigma^2} & 0 \\ 0 & \dfrac{2}{\sigma^2} \end{pmatrix}, \tag{1}$$

so that the expression for the metric is

$$ds_F^2 = \frac{d\mu^2 + 2\, d\sigma^2}{\sigma^2}. \tag{2}$$
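The entries of the Fisher matrix for $\beta = (\mu, \sigma)$, namely $1/\sigma^2$ and $2/\sigma^2$ on the diagonal and zero elsewhere, can be checked numerically. The following is a minimal sketch, not part of the original derivation: it approximates the expectation of the products of the partial derivatives of $\ln f$ with a trapezoidal rule. The function names, the integration window of 12 standard deviations and the number of steps are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density f(x, mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def fisher_matrix_numeric(mu, sigma, half_width=12.0, steps=4000):
    """Approximate g_ij = E[d_i ln f * d_j ln f] for beta = (mu, sigma)
    with the trapezoidal rule on [mu - half_width*sigma, mu + half_width*sigma]."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps + 1):
        x = a + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        # Partial derivatives of ln f with respect to mu and sigma.
        score = ((x - mu) / sigma**2, -1.0 / sigma + (x - mu) ** 2 / sigma**3)
        fx = normal_pdf(x, mu, sigma)
        for i in range(2):
            for j in range(2):
                g[i][j] += weight * h * fx * score[i] * score[j]
    return g

g = fisher_matrix_numeric(1.5, 0.75)
print(g)  # compare with [[1/sigma^2, 0], [0, 2/sigma^2]] for sigma = 0.75
```

The matrix does not depend on $\mu$, which is the numerical counterpart of the metric being invariant under horizontal translations of the half-plane.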



The Fisher distance is the one associated with the Fisher information matrix (1). In
order to express such a notion of distance and to characterize the geometry in the plane
$\mathbb{H}_F^2$, we analyze its analogies with the well-known Poincaré half-plane $\mathbb{H}^2$, a model for
hyperbolic geometry, the metric of which is given by the matrix

$$[g_{ij}]_H = \begin{pmatrix} \dfrac{1}{\sigma^2} & 0 \\ 0 & \dfrac{1}{\sigma^2} \end{pmatrix}. \tag{3}$$

The inner product associated with the Fisher matrix (1) will be denoted by $\langle \cdot, \cdot \rangle_F$
and the distance between $P = (\mu_1, \sigma_1)$ and $Q = (\mu_2, \sigma_2)$ in the upper half-plane $\mathbb{H}_F^2$
by $d_F(P, Q)$. The distance in the Poincaré half-plane induced by (3) will be denoted by
$d_H(P, Q)$. By considering the similarity mapping $\Psi : \mathbb{H}_F^2 \to \mathbb{H}^2$ defined by
$\Psi(\mu, \sigma) = (\mu/\sqrt{2}, \sigma)$, we can see that



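$$d_F((\mu_1, \sigma_1), (\mu_2, \sigma_2)) = \sqrt{2}\, d_H\!\left(\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right), \left(\frac{\mu_2}{\sqrt{2}}, \sigma_2\right)\right). \tag{4}$$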
Besides, the geodesics in $\mathbb{H}_F^2$ are the inverse image, by $\Psi$, of the geodesics in $\mathbb{H}^2$.
Vertical half-lines and half-circles centered at $\sigma = 0$ are the geodesics in $\mathbb{H}^2$ (see, e.g.,
[3, Ch. 7]). Hence, the geodesics in $\mathbb{H}_F^2$ are half-lines and half-ellipses centered at $\sigma = 0$, with
eccentricity $1/\sqrt{2}$. We can also assert that a circle in the Fisher distance is an ellipse with
the same eccentricity and its center is below the Euclidean center. Figure 2 shows the
Fisher circle centered at A = (1.5, 0.75) and radius 2.3769, and the geodesics connecting
the center to points B, E and F on the circle.

Figure 2: A Fisher circle centered at A and geodesic arcs AB, AF and
AE, with $d_F(A, B) = d_F(A, F) = d_F(A, E)$.



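The distance between two points in the Poincaré half-plane can be expressed by the
logarithm of the cross-ratio between these two points and the points at infinity:

$$d_H(P, Q) = \ln(P_\infty, P, Q, Q_\infty).$$

It can be stated by the following formulas, according to whether $P$ and $Q$ are vertically
aligned or not, as illustrated in Figure 3, respectively:

$$d_H(P, Q) = \ln\left(\frac{\sigma_Q}{\sigma_P}\right) \quad \text{or} \quad d_H(P, Q) = \ln\left(\frac{|P - Q_\infty|}{|P - P_\infty|} \cdot \frac{|Q - P_\infty|}{|Q - Q_\infty|}\right),$$

where $\sigma_P \leq \sigma_Q$ in the first case and, in the second, $P_\infty$ and $Q_\infty$ are the endpoints at
$\sigma = 0$ of the half-circle through $P$ and $Q$, with $P_\infty$ on the side of $P$.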
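By recalling that the Fisher distance $d_F$ and the hyperbolic distance $d_H$ are related
by (4), we obtain the following closed expression for the Fisher information distance:

$$d_F((\mu_1, \sigma_1), (\mu_2, \sigma_2)) = \sqrt{2} \ln \frac{\left|\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2}}, -\sigma_2\right)\right| + \left|\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2}}, \sigma_2\right)\right|}{\left|\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2}}, -\sigma_2\right)\right| - \left|\left(\frac{\mu_1}{\sqrt{2}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2}}, \sigma_2\right)\right|} \tag{5}$$

$$= \sqrt{2} \ln \left(\frac{\mathcal{F}((\mu_1, \sigma_1), (\mu_2, \sigma_2)) + (\mu_1 - \mu_2)^2 + 2(\sigma_1^2 + \sigma_2^2)}{4 \sigma_1 \sigma_2}\right), \tag{6}$$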
Figure 3: Elements to compute the distance $d_H(P, Q)$, in case the
points $P, Q \in \mathbb{H}^2$ are vertically aligned (left) or not (right).

Figure 4: Equidistant pairs in Fisher metric: $d_F(A, B) = d_F(C, D) = 2.37687$,
where A = (1.5, 0.75), B = (3.5, 0.75) and C = (0.5, 1.5),
D = (4.5, 1.5).



where

$$\mathcal{F}((\mu_1, \sigma_1), (\mu_2, \sigma_2)) = \sqrt{\left((\mu_1 - \mu_2)^2 + 2(\sigma_1 - \sigma_2)^2\right)\left((\mu_1 - \mu_2)^2 + 2(\sigma_1 + \sigma_2)^2\right)}.$$
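The closed form built from $\mathcal{F}$ translates directly into a few lines of code. The sketch below is an illustration rather than the authors' implementation; the function name `fisher_distance` and its argument order are assumptions. It evaluates the distance for the two pairs of distributions listed in the caption of Figure 4.

```python
import math

def fisher_distance(mu1, sigma1, mu2, sigma2):
    """Fisher distance between N(mu1, sigma1^2) and N(mu2, sigma2^2),
    using the closed form with the auxiliary quantity F."""
    d_mu_sq = (mu1 - mu2) ** 2
    f = math.sqrt((d_mu_sq + 2.0 * (sigma1 - sigma2) ** 2)
                  * (d_mu_sq + 2.0 * (sigma1 + sigma2) ** 2))
    ratio = (f + d_mu_sq + 2.0 * (sigma1**2 + sigma2**2)) / (4.0 * sigma1 * sigma2)
    return math.sqrt(2.0) * math.log(ratio)

# The two pairs of Figure 4.
d_ab = fisher_distance(1.5, 0.75, 3.5, 0.75)
d_cd = fisher_distance(0.5, 1.5, 4.5, 1.5)
print(round(d_ab, 5), round(d_cd, 5))  # both should agree with the value 2.37687 of Figure 4
```

For equal means the argument of the logarithm collapses to the ratio of the larger to the smaller standard deviation, so the function returns $\sqrt{2}\,|\ln(\sigma_2/\sigma_1)|$ in that case.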

Figure 4 illustrates two distinct pairs of Gaussian distributions which are equidistant
with the Fisher metric. Moreover, from the relations (5)-(6) we can deduce facts of the
geometry of the upper half-plane with the Fisher metric: it is hyperbolic with constant
curvature equal to $-\frac{1}{2}$ and the shortest path between the representatives of two normal
distributions is either on a vertical line or on a half-ellipse (see Figure 6(a)).

The Fisher distance between two PDF's $P = (\mu, \sigma_1)$ and $Q = (\mu, \sigma_2)$ is

$$d_F(P, Q) = \sqrt{2}\, |\ln(\sigma_2 / \sigma_1)| \tag{7}$$

and the vertical line connecting $P$ and $Q$ is a geodesic in the Fisher half-plane. On the
other hand, the geodesic connecting $P = (\mu_1, \sigma)$ and $Q = (\mu_2, \sigma)$ associated with two
normal PDF's with the same variance is not the horizontal line connecting these points
(the shortest path is contained in a half-ellipse). Indeed,

$$d_F(P, Q) = \sqrt{2} \ln\left(\frac{4\sigma^2 + (\mu_1 - \mu_2)^2 + |\mu_1 - \mu_2| \sqrt{8\sigma^2 + (\mu_1 - \mu_2)^2}}{4\sigma^2}\right) < \frac{|\mu_2 - \mu_1|}{\sigma}. \tag{8}$$

The expression on the right of (8) is the length of the horizontal segment joining $P$ and
$Q$. Nevertheless, in case just normal PDF's with constant variance are considered, the
expression on the right of (8) is a proper distance.

It is worth mentioning that the Fisher metric can also be used to establish the concept
of average distribution between two given distributions A and Q. This is determined by
the point M on the geodesic segment joining A and Q which is equidistant from these
points, as illustrated in Figure 5.




Figure 5: The Fisher average between distributions A = (1.5, .75) and 
Q = (1.0610, 0.1646) is M = (1.1400, 0.3711). The plotted points form 
a polygonal with equal Fisher length segments. 
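The Fisher average of Figure 5 can be reproduced by walking along the geodesic in the Poincaré half-plane and mapping back. The sketch below is one possible way to do it, under the following assumptions: points are given as `(mu, sigma)` tuples, the helper names are hypothetical, and the hyperbolic arc length along a half-circle is parametrized by $t = \ln\tan(\theta/2)$, where $\theta$ is the polar angle measured from the center of the half-circle.

```python
import math

def fisher_distance(p, q):
    """Closed-form Fisher distance between two univariate normals (mu, sigma)."""
    (mu1, s1), (mu2, s2) = p, q
    d2 = (mu1 - mu2) ** 2
    f = math.sqrt((d2 + 2.0 * (s1 - s2) ** 2) * (d2 + 2.0 * (s1 + s2) ** 2))
    return math.sqrt(2.0) * math.log((f + d2 + 2.0 * (s1**2 + s2**2)) / (4.0 * s1 * s2))

def fisher_geodesic_point(p, q, s):
    """Point at fraction s in [0, 1] of the Fisher geodesic from p to q."""
    (mu1, s1), (mu2, s2) = p, q
    if math.isclose(mu1, mu2):
        # Vertical geodesic: ln(sigma) is proportional to arc length.
        return mu1, s1 ** (1.0 - s) * s2 ** s
    # Map to the Poincare half-plane: (mu, sigma) -> (mu / sqrt(2), sigma).
    x1, x2 = mu1 / math.sqrt(2.0), mu2 / math.sqrt(2.0)
    # Center c and radius r of the half-circle through both points.
    c = (x1**2 + s1**2 - x2**2 - s2**2) / (2.0 * (x1 - x2))
    r = math.hypot(x1 - c, s1)
    # Arc-length parameter t = ln tan(theta / 2), with tan(theta/2) = (r - (x - c)) / y.
    t1 = math.log((r - (x1 - c)) / s1)
    t2 = math.log((r - (x2 - c)) / s2)
    u = math.exp((1.0 - s) * t1 + s * t2)
    x = c + r * (1.0 - u * u) / (1.0 + u * u)   # r * cos(theta)
    y = r * 2.0 * u / (1.0 + u * u)             # r * sin(theta)
    # Map back to the Fisher half-plane.
    return math.sqrt(2.0) * x, y

A, Q = (1.5, 0.75), (1.0610, 0.1646)
M = fisher_geodesic_point(A, Q, 0.5)
print(M)  # close to the point M = (1.1400, 0.3711) reported in Figure 5
print(fisher_distance(A, M), fisher_distance(M, Q))  # the two values should coincide
```

Calling `fisher_geodesic_point` with equally spaced values of `s` yields points separated by equal Fisher lengths, as in the polygonal line of Figure 5.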



2.1 Univariate normal distributions described in other usual parameters

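Univariate normal distributions may also be described by means of the so-called source
$(\lambda_1, \lambda_2) \in \mathbb{R} \times \mathbb{R}_+$, natural $(\theta_1, \theta_2) \in \mathbb{R} \times \mathbb{R}_-$ and expectation parameters
$(\eta_1, \eta_2) \in \mathbb{R} \times \mathbb{R}_+$, respectively defined by

$$(\lambda_1, \lambda_2) = (\mu, \sigma^2), \quad (\theta_1, \theta_2) = \left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right) \quad \text{and} \quad (\eta_1, \eta_2) = (\mu, \sigma^2 + \mu^2).$$

Therefore,

$$(\mu, \sigma) = \left(\lambda_1, \sqrt{\lambda_2}\right) = \left(-\frac{\theta_1}{2\theta_2}, \frac{1}{\sqrt{-2\theta_2}}\right) = \left(\eta_1, \sqrt{\eta_2 - \eta_1^2}\right)$$

and expressions (5)-(6) may be restated, for the source parameters, as

$$d_F((\lambda_{11}, \sqrt{\lambda_{21}}), (\lambda_{12}, \sqrt{\lambda_{22}})) = d_\lambda((\lambda_{11}, \lambda_{21}), (\lambda_{12}, \lambda_{22})) =$$

$$\sqrt{2} \ln \frac{\sqrt{(\lambda_{11} - \lambda_{12})^2 + 2(\sqrt{\lambda_{21}} + \sqrt{\lambda_{22}})^2} + \sqrt{(\lambda_{11} - \lambda_{12})^2 + 2(\sqrt{\lambda_{21}} - \sqrt{\lambda_{22}})^2}}{\sqrt{(\lambda_{11} - \lambda_{12})^2 + 2(\sqrt{\lambda_{21}} + \sqrt{\lambda_{22}})^2} - \sqrt{(\lambda_{11} - \lambda_{12})^2 + 2(\sqrt{\lambda_{21}} - \sqrt{\lambda_{22}})^2}},$$

for the natural parameters as

$$d_\theta((\theta_{11}, \theta_{21}), (\theta_{12}, \theta_{22})) = \sqrt{2} \ln \frac{\sqrt{4\left(\frac{1}{\sqrt{-\theta_{21}}} + \frac{1}{\sqrt{-\theta_{22}}}\right)^2 + \left(\frac{\theta_{11}}{\theta_{21}} - \frac{\theta_{12}}{\theta_{22}}\right)^2} + \sqrt{4\left(\frac{1}{\sqrt{-\theta_{21}}} - \frac{1}{\sqrt{-\theta_{22}}}\right)^2 + \left(\frac{\theta_{11}}{\theta_{21}} - \frac{\theta_{12}}{\theta_{22}}\right)^2}}{\sqrt{4\left(\frac{1}{\sqrt{-\theta_{21}}} + \frac{1}{\sqrt{-\theta_{22}}}\right)^2 + \left(\frac{\theta_{11}}{\theta_{21}} - \frac{\theta_{12}}{\theta_{22}}\right)^2} - \sqrt{4\left(\frac{1}{\sqrt{-\theta_{21}}} - \frac{1}{\sqrt{-\theta_{22}}}\right)^2 + \left(\frac{\theta_{11}}{\theta_{21}} - \frac{\theta_{12}}{\theta_{22}}\right)^2}}$$

and for the expectation parameters as

$$d_F\left(\left(\eta_{11}, \sqrt{\eta_{21} - \eta_{11}^2}\right), \left(\eta_{12}, \sqrt{\eta_{22} - \eta_{12}^2}\right)\right) = d_\eta((\eta_{11}, \eta_{21}), (\eta_{12}, \eta_{22})) =$$

$$\sqrt{2} \ln \frac{\sqrt{(\eta_{11} - \eta_{12})^2 + 2\left(\sqrt{\eta_{21} - \eta_{11}^2} + \sqrt{\eta_{22} - \eta_{12}^2}\right)^2} + \sqrt{(\eta_{11} - \eta_{12})^2 + 2\left(\sqrt{\eta_{21} - \eta_{11}^2} - \sqrt{\eta_{22} - \eta_{12}^2}\right)^2}}{\sqrt{(\eta_{11} - \eta_{12})^2 + 2\left(\sqrt{\eta_{21} - \eta_{11}^2} + \sqrt{\eta_{22} - \eta_{12}^2}\right)^2} - \sqrt{(\eta_{11} - \eta_{12})^2 + 2\left(\sqrt{\eta_{21} - \eta_{11}^2} - \sqrt{\eta_{22} - \eta_{12}^2}\right)^2}}.$$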
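In practice it is often simpler to convert the other parametrizations back to $(\mu, \sigma)$ and reuse the classic closed form. The sketch below illustrates this under stated assumptions: the converter names are hypothetical, points are passed as tuples, and `fisher_distance_natural` is a direct transcription of the natural-parameter expression, included only to cross-check it against the converted classic formula.

```python
import math

def from_source(lam1, lam2):
    """(lambda1, lambda2) = (mu, sigma^2) -> (mu, sigma)."""
    return lam1, math.sqrt(lam2)

def from_natural(th1, th2):
    """(theta1, theta2) = (mu/sigma^2, -1/(2 sigma^2)), theta2 < 0 -> (mu, sigma)."""
    return -th1 / (2.0 * th2), 1.0 / math.sqrt(-2.0 * th2)

def from_expectation(eta1, eta2):
    """(eta1, eta2) = (mu, sigma^2 + mu^2) -> (mu, sigma)."""
    return eta1, math.sqrt(eta2 - eta1**2)

def fisher_distance(p, q):
    """Classic closed form for two points given as (mu, sigma)."""
    (mu1, s1), (mu2, s2) = p, q
    d2 = (mu1 - mu2) ** 2
    far = math.sqrt(d2 + 2.0 * (s1 + s2) ** 2)
    near = math.sqrt(d2 + 2.0 * (s1 - s2) ** 2)
    return math.sqrt(2.0) * math.log((far + near) / (far - near))

def fisher_distance_natural(t1, t2):
    """Distance written directly in natural parameters; t1 = (theta11, theta21)."""
    (th11, th21), (th12, th22) = t1, t2
    a, b = 1.0 / math.sqrt(-th21), 1.0 / math.sqrt(-th22)
    k2 = (th11 / th21 - th12 / th22) ** 2
    far = math.sqrt(4.0 * (a + b) ** 2 + k2)
    near = math.sqrt(4.0 * (a - b) ** 2 + k2)
    return math.sqrt(2.0) * math.log((far + near) / (far - near))

# Two distributions in classic parameters and their natural parameters.
P, Q = (1.5, 0.75), (3.5, 1.2)
to_natural = lambda mu, s: (mu / s**2, -1.0 / (2.0 * s**2))
tP, tQ = to_natural(*P), to_natural(*Q)
print(fisher_distance(P, Q), fisher_distance_natural(tP, tQ))  # the two values should coincide
```

Converting first and applying a single distance routine keeps the square roots of the source and expectation parameters in one place, which makes domain errors (a non-positive variance) easier to detect.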







Figure 6: Shortest path between the normal distributions A and S in the distinct half-
planes: (a) Classic parameters $(\mu, \sigma)$ - mean × standard deviation; (b) Source parameters
$(\mu, \sigma^2)$ - mean × variance; (c) Natural parameters $(\theta_1, \theta_2) = (\mu/\sigma^2, -1/(2\sigma^2))$ and (d) Expectation
parameters $(\eta_1, \eta_2) = (\mu, \mu^2 + \sigma^2)$.



The shortest path between two normal distributions is depicted in Figure 6 for the
four distinct half-planes, described by the classic (a), the source (b), the natural (c) and
the expectation parameters (d). Besides the half-ellipse that contains the shortest path
in the classic mean × standard deviation half-plane, the shortest paths in the source mean
× variance and in the expectation half-planes are described by arcs of parabolas, whereas
an arc of a half-hyperbola contains the shortest path in the natural half-plane.



2.2 The Kullback-Leibler divergence and the Fisher distance 

Another measure of dissimilarity between two PDF's is the Kullback-Leibler divergence
[10], which is used in information theory and commonly referred to as the relative entropy
of a probability distribution. It is neither a distance nor a symmetric measure. In what
follows we discuss its relation with the Fisher distance in the case of univariate normal
distributions. Its expression in this case is:

$$KL((\mu_1, \sigma_1) \| (\mu_2, \sigma_2)) = \frac{1}{2}\left(2 \ln\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2}{\sigma_2^2} + \frac{(\mu_1 - \mu_2)^2}{\sigma_2^2} - 1\right).$$

A symmetrized version of this measure,

$$d_{KL}((\mu_1, \sigma_1), (\mu_2, \sigma_2)) = \sqrt{KL((\mu_1, \sigma_1) \| (\mu_2, \sigma_2)) + KL((\mu_2, \sigma_2) \| (\mu_1, \sigma_1))}$$

$$= \sqrt{\frac{1}{2}\left(\frac{(\mu_1 - \mu_2)^2 + \sigma_1^2}{\sigma_2^2} + \frac{(\mu_1 - \mu_2)^2 + \sigma_2^2}{\sigma_1^2} - 2\right)},$$

is also used.

If the points in the parameter space are vertically aligned ($P = (\mu, \sigma_1)$ and
$Q = (\mu, \sigma_2)$, taking $\sigma_2 \geq \sigma_1$), the Fisher distance is $d = d_F(P, Q) = \sqrt{2} \ln(\sigma_2/\sigma_1)$, from which
we get an expression of the Kullback-Leibler divergences in terms of the Fisher distance:

$$KL(P \| Q) = g(d) = \frac{1}{2}\left(e^{-\sqrt{2} d} + \sqrt{2}\, d - 1\right),$$

$$KL(Q \| P) = g(-d) \quad \text{and} \quad d_{KL}(P, Q) = \sqrt{\frac{e^{\sqrt{2} d} + e^{-\sqrt{2} d}}{2} - 1} = \sqrt{\cosh(\sqrt{2} d) - 1}.$$
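These relations are easy to verify numerically. The sketch below is only an illustration: the function names `kl_normal`, `d_kl` and `g` are assumptions, and the test point (equal means, standard deviations 0.75 and 1.2) is arbitrary. It implements the standard Kullback-Leibler divergence between two univariate normals and compares it with the expressions in terms of the Fisher distance $d$.

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    """KL((mu1, s1) || (mu2, s2)) for univariate normal distributions."""
    return 0.5 * (2.0 * math.log(s2 / s1) + (s1 / s2) ** 2 + ((mu1 - mu2) / s2) ** 2 - 1.0)

def d_kl(mu1, s1, mu2, s2):
    """Square root of the sum of the two directed divergences."""
    return math.sqrt(kl_normal(mu1, s1, mu2, s2) + kl_normal(mu2, s2, mu1, s1))

def g(d):
    """KL(P || Q) as a function of the Fisher distance d, for equal means."""
    return 0.5 * (math.exp(-math.sqrt(2.0) * d) + math.sqrt(2.0) * d - 1.0)

mu, s1, s2 = 1.5, 0.75, 1.2
d = math.sqrt(2.0) * math.log(s2 / s1)      # Fisher distance for equal means
print(kl_normal(mu, s1, mu, s2), g(d))      # the two values should coincide
print(kl_normal(mu, s2, mu, s1), g(-d))     # the two values should coincide
print(d_kl(mu, s1, mu, s2), math.sqrt(math.cosh(math.sqrt(2.0) * d) - 1.0))
```

The asymmetry of the divergence shows up as the difference between `g(d)` and `g(-d)`, while `d_kl` is symmetric by construction.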



Figure 7 (left) shows the graphs of the mappings $g(d) = KL(A \| Y)$ (red continuous
curve), $g(-d) = KL(Y \| A)$ (blue dashed curve), and the symmetrized $d_{KL}(A, Y)$ (green
dot-dashed curve) when $Y$ goes from A to F in Figure 2, compared to the Fisher distance
$d$ (identity), which varies in the interval [0, 2.3769].

It is straightforward in this case to prove that the symmetrized Kullback-Leibler divergence
approaches the Fisher distance for small $d$. In fact, this result is more general: it also holds
for multivariate normal distributions when $P$ approaches $Q$ in the parameter space [1].
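For the vertically aligned case the small-$d$ behaviour can be made explicit by expanding the hyperbolic cosine in the expression for $d_{KL}$ in terms of $d$. The following worked step is an addition for illustration, not taken from the original text:

```latex
\begin{align*}
d_{KL}(P,Q)^2 &= \cosh(\sqrt{2}\,d) - 1
   = \frac{(\sqrt{2}\,d)^2}{2!} + \frac{(\sqrt{2}\,d)^4}{4!} + O(d^6)
   = d^2 + \frac{d^4}{6} + O(d^6), \\
d_{KL}(P,Q) &= d\,\sqrt{1 + \frac{d^2}{6} + O(d^4)}
   = d + \frac{d^3}{12} + O(d^5).
\end{align*}
```

The two measures therefore agree to first order, and the symmetrized divergence exceeds the Fisher distance by a term of order $d^3$.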

Figure 7 (right) displays the graphs of the mappings $KL(A \| Y)$ (red continuous
curve), $KL(Y \| A)$ (blue dashed curve), and the symmetrized $d_{KL}(A, Y)$ (green dot-dashed
curve), compared to the Fisher distance $d$ (identity) varying in the interval [0, 2.3769], with
$Y$ going from A to B along the geodesic path of Figure 2.






Figure 7: Kullback-Leibler divergences compared to the Fisher distance
along the geodesics of Figure 2 connecting the PDF's A to F
(left) and A to B (right).



3 Fisher information geometry of multivariate 
normal distributions 

For more general p-variate PDF's, defined by an n-dimensional parameter space, the
coefficients of the Fisher matrix are given by

$$g_{ij}(\beta) = \int_{\mathbb{R}^p} f(x, \beta)\, \frac{\partial \ln f(x, \beta)}{\partial \beta_i}\, \frac{\partial \ln f(x, \beta)}{\partial \beta_j}\, dx.$$

The previous analysis can be extended to independent p-variate normal distributions:

$$f(x, \mu, \Sigma) = (2\pi)^{-p/2} (\det \Sigma)^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right),$$

where

$x = (x_1, x_2, \ldots, x_p)^T$,

$\mu = (\mu_1, \mu_2, \ldots, \mu_p)^T$ (mean vector) and

$\Sigma$ is the covariance matrix (symmetric positive definite p × p matrix).

Note that, for general multivariate normal distributions, the parameter space has
dimension $n = p + p(p + 1)/2$.

3.1 Round Gaussian distributions 

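If $\Sigma = \sigma^2 I$ (scalar covariance matrix), the set of all such distributions can be identified
with the $(p + 1)$-dimensional half-space $\mathbb{H}_F^{p+1}$, parametrized by $\beta = (\mu_1, \mu_2, \ldots, \mu_p, \sigma)$,
and the Fisher information matrix is:

$$[g_{ij}]_{F,r} = \begin{pmatrix} \dfrac{1}{\sigma^2} & & & \\ & \ddots & & \\ & & \dfrac{1}{\sigma^2} & \\ & & & \dfrac{2p}{\sigma^2} \end{pmatrix}.$$

The last entry follows from $E[(\partial_\sigma \ln f)^2] = \mathrm{Var}(|x - \mu|^2)/\sigma^6 = 2p/\sigma^2$; it is also the
value obtained by restricting the metric of Section 3.2 to $\sigma_1 = \cdots = \sigma_p = \sigma$.

We have again similarity with the matrix of the Poincaré model metric in the $(p + 1)$-
dimensional half-space $\mathbb{H}^{p+1}$,

$$[g_{ij}]_{H} = \begin{pmatrix} \dfrac{1}{\sigma^2} & & \\ & \ddots & \\ & & \dfrac{1}{\sigma^2} \end{pmatrix},$$

and the similarity transformation

$$\Psi_p : \mathbb{H}_F^{p+1} \to \mathbb{H}^{p+1}, \quad \Psi_p(\mu_1, \mu_2, \ldots, \mu_p, \sigma) = \left(\frac{\mu_1}{\sqrt{2p}}, \frac{\mu_2}{\sqrt{2p}}, \ldots, \frac{\mu_p}{\sqrt{2p}}, \sigma\right).$$

For $\mu_1 = (\mu_{11}, \mu_{12}, \ldots, \mu_{1p})$ and $\mu_2 = (\mu_{21}, \mu_{22}, \ldots, \mu_{2p})$ we have a closed form for the
Fisher distance between the respective Gaussian PDF's:

$$d_{F,r}((\mu_1, \sigma_1), (\mu_2, \sigma_2)) = \sqrt{2p}\, d_H\!\left(\left(\frac{\mu_1}{\sqrt{2p}}, \sigma_1\right), \left(\frac{\mu_2}{\sqrt{2p}}, \sigma_2\right)\right)$$

$$= \sqrt{2p} \ln \frac{\left|\left(\frac{\mu_1}{\sqrt{2p}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2p}}, -\sigma_2\right)\right| + \left|\left(\frac{\mu_1}{\sqrt{2p}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2p}}, \sigma_2\right)\right|}{\left|\left(\frac{\mu_1}{\sqrt{2p}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2p}}, -\sigma_2\right)\right| - \left|\left(\frac{\mu_1}{\sqrt{2p}}, \sigma_1\right) - \left(\frac{\mu_2}{\sqrt{2p}}, \sigma_2\right)\right|},$$

where $| \cdot |$ is the standard Euclidean vector norm and the subindex r stands for round
distributions.

The geodesics in the parameter space $(\mu, \sigma)$ between two round p-variate Gaussian
distributions are contained in planes orthogonal to the hyperplane $\sigma = 0$, and are either
a line ($\mu$ = constant) or a half-ellipse with eccentricity $\sqrt{1 - 1/(2p)}$, centered at this hyperplane.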
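A minimal sketch of the round-Gaussian distance follows. It assumes that the Fisher matrix of the family $N(\mu, \sigma^2 I)$ has the entry $2p/\sigma^2$ in the $\sigma$ direction (this is the value obtained from $\mathrm{Var}(|x - \mu|^2) = 2p\sigma^4$), so that the scaling factor of the Poincaré distance is $\sqrt{2p}$. The function name and the use of plain lists for the mean vectors are illustrative choices.

```python
import math

def fisher_distance_round(mu1, sigma1, mu2, sigma2):
    """Fisher distance between N(mu1, sigma1^2 I) and N(mu2, sigma2^2 I).

    mu1, mu2: sequences of length p; sigma1, sigma2: positive floats.
    Assumes the metric (|d mu|^2 + 2p d sigma^2) / sigma^2.
    """
    p = len(mu1)
    if len(mu2) != p:
        raise ValueError("mean vectors must have the same dimension")
    # Squared Euclidean distance between the rescaled means mu / sqrt(2p).
    dx_sq = sum((a - b) ** 2 for a, b in zip(mu1, mu2)) / (2.0 * p)
    far = math.sqrt(dx_sq + (sigma1 + sigma2) ** 2)    # distance to the reflected point
    near = math.sqrt(dx_sq + (sigma1 - sigma2) ** 2)   # Euclidean distance in the half-space
    return math.sqrt(2.0 * p) * math.log((far + near) / (far - near))

# For p = 1 this reduces to the univariate distance of Section 2.
print(fisher_distance_round([1.5], 0.75, [3.5], 0.75))
# Equal means in dimension p = 3: sqrt(2p) * |ln(sigma2 / sigma1)|.
print(fisher_distance_round([0.0, 1.0, 2.0], 0.5, [0.0, 1.0, 2.0], 2.0))
```

For equal means the result can be compared with the fixed-mean formula for general covariance matrices, which gives the same value $\sqrt{2p}\,|\ln(\sigma_2/\sigma_1)|$ when both covariance matrices are scalar.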
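
3.2 Diagonal Gaussian distributions

For general $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)$ (diagonal covariance matrix), $\sigma_i > 0$, $\forall i$, the set of
all independent multivariate normal distributions is parametrized by an intersection of
half-spaces in $\mathbb{R}^{2p}$ ($\beta = (\mu_1, \sigma_1, \mu_2, \sigma_2, \ldots, \mu_p, \sigma_p)$, $\sigma_i > 0$), so the Fisher information matrix
is:

$$[g_{ij}]_{F,d} = \begin{pmatrix} \dfrac{1}{\sigma_1^2} & & & & \\ & \dfrac{2}{\sigma_1^2} & & & \\ & & \ddots & & \\ & & & \dfrac{1}{\sigma_p^2} & \\ & & & & \dfrac{2}{\sigma_p^2} \end{pmatrix}.$$

We can show that, in this case, the metric is a product metric on the space $\mathbb{H}_F^{2p}$ and
therefore we have the following closed form for the Fisher distance between the respective
Gaussian PDFs:

$$d_{F,d}((\mu_{11}, \sigma_{11}, \ldots, \mu_{1p}, \sigma_{1p}), (\mu_{21}, \sigma_{21}, \ldots, \mu_{2p}, \sigma_{2p})) = \sqrt{\sum_{i=1}^{p} d_F((\mu_{1i}, \sigma_{1i}), (\mu_{2i}, \sigma_{2i}))^2}, \tag{9}$$

that is,

$$d_{F,d}((\mu_{11}, \sigma_{11}, \ldots, \mu_{1p}, \sigma_{1p}), (\mu_{21}, \sigma_{21}, \ldots, \mu_{2p}, \sigma_{2p})) = \sqrt{2 \sum_{i=1}^{p} \left( \ln \frac{\left|\left(\frac{\mu_{1i}}{\sqrt{2}}, \sigma_{1i}\right) - \left(\frac{\mu_{2i}}{\sqrt{2}}, -\sigma_{2i}\right)\right| + \left|\left(\frac{\mu_{1i}}{\sqrt{2}}, \sigma_{1i}\right) - \left(\frac{\mu_{2i}}{\sqrt{2}}, \sigma_{2i}\right)\right|}{\left|\left(\frac{\mu_{1i}}{\sqrt{2}}, \sigma_{1i}\right) - \left(\frac{\mu_{2i}}{\sqrt{2}}, -\sigma_{2i}\right)\right| - \left|\left(\frac{\mu_{1i}}{\sqrt{2}}, \sigma_{1i}\right) - \left(\frac{\mu_{2i}}{\sqrt{2}}, \sigma_{2i}\right)\right|} \right)^2}, \tag{10}$$

where $| \cdot |$ is the standard Euclidean vector norm and the subindex d stands for diagonal
distributions.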
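Because the metric is a product metric, the distance for diagonal covariance matrices is the Euclidean combination of the coordinate-wise univariate Fisher distances. The sketch below illustrates this; the function names and the representation of a distribution as a list of `(mu_i, sigma_i)` pairs are assumptions made for the example.

```python
import math

def fisher_distance_univariate(mu1, s1, mu2, s2):
    """Closed-form Fisher distance between two univariate normals."""
    d2 = (mu1 - mu2) ** 2
    far = math.sqrt(d2 + 2.0 * (s1 + s2) ** 2)
    near = math.sqrt(d2 + 2.0 * (s1 - s2) ** 2)
    return math.sqrt(2.0) * math.log((far + near) / (far - near))

def fisher_distance_diagonal(p1, p2):
    """Fisher distance between two normals with diagonal covariance.

    p1, p2: sequences of (mu_i, sigma_i) pairs, one pair per coordinate.
    """
    if len(p1) != len(p2):
        raise ValueError("both distributions must have the same dimension")
    return math.sqrt(sum(
        fisher_distance_univariate(m1, s1, m2, s2) ** 2
        for (m1, s1), (m2, s2) in zip(p1, p2)
    ))

# Two bivariate examples built from the equidistant pairs of Figure 4.
P = [(1.5, 0.75), (0.5, 1.5)]
Q = [(3.5, 0.75), (4.5, 1.5)]
print(fisher_distance_diagonal(P, Q))  # sqrt(2) times the common univariate distance
```

Coordinates in which the two distributions coincide contribute zero, so the function also covers the case where only some of the marginals differ.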

These matrices induce a metric of constant negative mean curvature (i.e. a hyperbolic
metric), which is equal to $-\frac{1}{2p}$ in case 3.1 (where the metric is $2p$ times the Poincaré
metric of $\mathbb{H}^{p+1}$) and to $-\frac{1}{2(2p-1)}$ in case 3.2. Expressions for
the distance and other geometric properties can be deduced using results on products of
Riemannian manifolds and relations with Poincaré models for hyperbolic spaces.



3.3 General Gaussian distributions 

For general p-variate normal distributions (given by any symmetric positive definite
covariance matrices) the analysis is much more complex, as pointed out in [2], and far from being
fully developed. From the Riemannian geometry viewpoint this is due to the fact that not
all the sectional curvatures of their natural parameter space (which is a $(p + p(p + 1)/2)$-
dimensional manifold) provided with the Fisher metric are constant. As an example,
for $p = 2$ we may parametrize the general (elliptical) 2-variate normal distributions by
$\beta = (\sigma_1, \sigma_2, \mu_1, \mu_2, u)$, where $\sigma_1^2, \sigma_2^2$ are the eigenvalues and $u$ the turning angle of the
eigenvectors of $\Sigma$. The level sets of a pair of such PDF's are families of rotated ellipses,
see Figure 8.
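The Fisher matrix which induces the distance in this parameter space can be deduced
as

$$\begin{pmatrix}
\dfrac{2}{\sigma_1^2} & 0 & 0 & 0 & 0 \\
0 & \dfrac{2}{\sigma_2^2} & 0 & 0 & 0 \\
0 & 0 & \dfrac{\cos^2 u}{\sigma_1^2} + \dfrac{\sin^2 u}{\sigma_2^2} & \sin u \cos u \left(\dfrac{1}{\sigma_1^2} - \dfrac{1}{\sigma_2^2}\right) & 0 \\
0 & 0 & \sin u \cos u \left(\dfrac{1}{\sigma_1^2} - \dfrac{1}{\sigma_2^2}\right) & \dfrac{\sin^2 u}{\sigma_1^2} + \dfrac{\cos^2 u}{\sigma_2^2} & 0 \\
0 & 0 & 0 & 0 & \dfrac{(\sigma_1^2 - \sigma_2^2)^2}{\sigma_1^2 \sigma_2^2}
\end{pmatrix},$$

taking $\Sigma = R(u)\, \mathrm{diag}(\sigma_1^2, \sigma_2^2)\, R(u)^T$ with $R(u)$ the counterclockwise rotation by the angle $u$.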
We could not derive a general closed form for the associated Fisher distance in this
parameter space. Here, as in most multivariate cases, numerical approaches must be
used to estimate the Fisher distance. In these approaches, the symmetrized Kullback-Leibler
divergence can be used to estimate the Fisher distance between nearby points in the parameter
space [1].

A special instance of this bivariate model is given by the set of points with fixed means
$\mu_1, \mu_2$ and turning angle $u = 0$. Using the characterization of geodesics as solutions of a
second order differential equation [9], we can assert that this two-dimensional submanifold
is totally geodesic (i.e. all the geodesics between two such points are contained in this
submanifold). Therefore, the Fisher distance can be calculated as in 3.2:

$$d_F((\sigma_{11}, \sigma_{12}, \mu_1, \mu_2, 0), (\sigma_{21}, \sigma_{22}, \mu_1, \mu_2, 0)) = \sqrt{2} \sqrt{\left(\ln \frac{\sigma_{11}}{\sigma_{21}}\right)^2 + \left(\ln \frac{\sigma_{12}}{\sigma_{22}}\right)^2}.$$


Figure 8: Bivariate normal distributions: level sets (left) and repre- 
sentation in the upper half-space (right). 

If we consider the $p(p + 1)/2$-dimensional statistical model of p-variate normal PDF's
with fixed mean $\mu$ and general covariance matrix $\Sigma$, the induced Fisher distance can be
deduced as [2]

$$d_F^2((\mu, \Sigma_1), (\mu, \Sigma_2)) = \frac{1}{2} \sum_{j=1}^{p} (\ln \lambda_j)^2, \tag{11}$$

where $\lambda_j$ are the eigenvalues of the matrix $(\Sigma_1)^{-1} \Sigma_2$ (i.e. $\lambda_j$ are the roots of the equation
$\det((\Sigma_1)^{-1} \Sigma_2 - \lambda I) = 0$). Note that, for $p = 1$, the expression (11) reduces to (7).
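For $p = 2$ the eigenvalues of $(\Sigma_1)^{-1}\Sigma_2$ are the roots of a quadratic, so the fixed-mean distance can be evaluated without a linear algebra library. The sketch below is restricted to that case as an illustration; the function name and the nested-tuple representation of the matrices are assumptions, and for larger $p$ a generalized symmetric eigenvalue solver would be the natural replacement.

```python
import math

def fisher_distance_fixed_mean_2x2(s1, s2):
    """Fisher distance between N(mu, s1) and N(mu, s2) for 2x2 covariance matrices.

    s1, s2: symmetric positive definite matrices given as ((a, b), (b, c)).
    """
    (a, b), (_, c) = s1
    det1 = a * c - b * b
    inv1 = ((c / det1, -b / det1), (-b / det1, a / det1))
    # m = inverse(s1) * s2
    m = [[sum(inv1[i][k] * s2[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # The eigenvalues are real and positive; clamp the discriminant against rounding.
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = (trace + disc) / 2.0
    lam2 = det / lam1          # product of the eigenvalues equals the determinant
    return math.sqrt(0.5 * (math.log(lam1) ** 2 + math.log(lam2) ** 2))

# Diagonal example: standard deviations (1, 2) versus (3, 1).
S1 = ((1.0, 0.0), (0.0, 4.0))
S2 = ((9.0, 0.0), (0.0, 1.0))
print(fisher_distance_fixed_mean_2x2(S1, S2))
print(math.sqrt(2.0) * math.sqrt(math.log(1.0 / 3.0) ** 2 + math.log(2.0 / 1.0) ** 2))
```

The two printed values should coincide, since for diagonal matrices the eigenvalues are the ratios of the variances and the formula reduces to the one for fixed means and turning angle $u = 0$.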

Moreover, by restricting (11) to the set of distributions with diagonal covariance
matrices, the induced metric is the same as the metric from the case 3.2 restricted to
distributions with fixed mean $\mu$.



4 Final remarks 

We have presented a geometrical view of the Fisher distance, focusing on the parameters 
that describe the normal distributions, to widen the range of possible interpretations for 
the prospective applications of information geometry, in particular to information theory. 

By exploring the two dimensional statistical model of the Gaussian (normal) univariate 
PDF, we have employed results from the classical hyperbolic geometry to derive closed 
forms for the Fisher distance in the most commonly used parameters. A relationship 
with the Kullback-Leibler measure of divergence was presented as well. The multivariate 
normal PDF's were also analyzed from the geometrical standpoint and closed forms for 
the Fisher distance were devised in special instances. 